{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\nQuick Start Tutorial for Compiling Deep Learning Models\n=======================================================\n**Author**: `Yao Wang <https://github.com/kevinthesun>`_, `Truman Tian <https://github.com/SiNZeRo>`_\n\nThis example shows how to build a neural network with Relay python frontend and\ngenerates a runtime library for Nvidia GPU with TVM.\nNotice that you need to build TVM with cuda and llvm enabled.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Overview for Supported Hardware Backend of TVM\n----------------------------------------------\nThe image below shows hardware backend currently supported by TVM:\n\n![](https://github.com/dmlc/web-data/raw/main/tvm/tutorial/tvm_support_list.png)\n\n     :align: center\n\nIn this tutorial, we'll choose cuda and llvm as target backends.\nTo begin with, let's import Relay and TVM.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nfrom tvm import relay\nfrom tvm.relay import testing\nimport tvm\nfrom tvm import te\nfrom tvm.contrib import graph_executor\nimport tvm.testing"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Neural Network in Relay\n------------------------------\nFirst, let's define a neural network with relay python frontend.\nFor simplicity, we'll use pre-defined resnet-18 network in Relay.\nParameters are initialized with Xavier initializer.\nRelay also supports other model formats such as MXNet, CoreML, ONNX and\nTensorflow.\n\nIn this tutorial, we assume we will do inference on our device and\nthe batch size is set to be 1. Input images are RGB color images of\nsize 224 * 224. We can call the\n:py:meth:`tvm.relay.expr.TupleWrapper.astext()` to show the network\nstructure.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "batch_size = 1\nnum_class = 1000\nimage_shape = (3, 224, 224)\ndata_shape = (batch_size,) + image_shape\nout_shape = (batch_size, num_class)\n\nmod, params = relay.testing.resnet.get_workload(\n    num_layers=18, batch_size=batch_size, image_shape=image_shape\n)\n\n# set show_meta_data=True if you want to show meta data\nprint(mod.astext(show_meta_data=False))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compilation\n-----------\nNext step is to compile the model using the Relay/TVM pipeline.\nUsers can specify the optimization level of the compilation.\nCurrently this value can be 0 to 3. The optimization passes include\noperator fusion, pre-computation, layout transformation and so on.\n\n:py:func:`relay.build` returns three components: the execution graph in\njson format, the TVM module library of compiled functions specifically\nfor this graph on the target hardware, and the parameter blobs of\nthe model. During the compilation, Relay does the graph-level\noptimization while TVM does the tensor-level optimization, resulting\nin an optimized runtime module for model serving.\n\nWe'll first compile for Nvidia GPU. Behind the scene, :py:func:`relay.build`\nfirst does a number of graph-level optimizations, e.g. pruning, fusing, etc.,\nthen registers the operators (i.e. the nodes of the optimized graphs) to\nTVM implementations to generate a `tvm.module`.\nTo generate the module library, TVM will first transfer the high level IR\ninto the lower intrinsic IR of the specified target backend, which is CUDA\nin this example. Then the machine code will be generated as the module library.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "opt_level = 3\ntarget = tvm.target.cuda()\nwith tvm.transform.PassContext(opt_level=opt_level):\n    lib = relay.build(mod, target, params=params)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run the generate library\n------------------------\nNow we can create graph executor and run the module on Nvidia GPU.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# create random input\ndev = tvm.cuda()\ndata = np.random.uniform(-1, 1, size=data_shape).astype(\"float32\")\n# create module\nmodule = graph_executor.GraphModule(lib[\"default\"](dev))\n# set input and parameters\nmodule.set_input(\"data\", data)\n# run\nmodule.run()\n# get output\nout = module.get_output(0, tvm.nd.empty(out_shape)).numpy()\n\n# Print first 10 elements of output\nprint(out.flatten()[0:10])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Save and Load Compiled Module\n-----------------------------\nWe can also save the graph, lib and parameters into files and load them\nback in deploy environment.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# save the graph, lib and params into separate files\nfrom tvm.contrib import utils\n\ntemp = utils.tempdir()\npath_lib = temp.relpath(\"deploy_lib.tar\")\nlib.export_library(path_lib)\nprint(temp.listdir())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# load the module back.\nloaded_lib = tvm.runtime.load_module(path_lib)\ninput_data = tvm.nd.array(data)\n\nmodule = graph_executor.GraphModule(loaded_lib[\"default\"](dev))\nmodule.run(data=input_data)\nout_deploy = module.get_output(0).numpy()\n\n# Print first 10 elements of output\nprint(out_deploy.flatten()[0:10])\n\n# check whether the output from deployed module is consistent with original one\ntvm.testing.assert_allclose(out_deploy, out, atol=1e-5)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}